Random Forests



In [ ]:

    
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
from sklearn.model_selection import train_test_split



In [ ]:

    
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()



In [ ]:

    
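from sklearn.ensemble import RandomForestClassifier
# Stratify the split so both sets keep the same class proportions
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=1)
# Fix random_state so the fitted forest and its importances are reproducible
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)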



In [ ]:

    
rf.feature_importances_



In [ ]:

    
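# Label each importance with its feature name and sort so the bars are easy to compare
pd.Series(rf.feature_importances_,
          index=cancer.feature_names).sort_values().plot(kind="barh")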

Exercise

Fit a RandomForestClassifier or RandomForestRegressor on a dataset of your choice. Try different values of n_estimators and max_depth and see how they affect performance and runtime. Then tune max_features with GridSearchCV.
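
One possible starting point for the exercise, sketched on the breast cancer data used above. The specific values tried for n_estimators, max_depth and max_features, the 5-fold cross-validation, and the fixed random_state are illustrative assumptions, not recommended settings; swap in your own dataset and grid.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=1)

# Part 1: see how n_estimators and max_depth affect accuracy and fit time.
# The values below are arbitrary picks for illustration.
for n_estimators in (10, 100):
    for max_depth in (3, None):
        start = time.perf_counter()
        rf = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth,
            random_state=0).fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        print("n_estimators={}, max_depth={}: test accuracy {:.3f}, "
              "fit time {:.2f}s".format(
                  n_estimators, max_depth, rf.score(X_test, y_test), elapsed))

# Part 2: tune max_features with cross-validation on the training set only.
# The dataset has 30 features, so 30 means "consider every feature per split".
param_grid = {"max_features": [2, 5, 10, 20, 30]}
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid, cv=5)
grid.fit(X_train, y_train)

print("best max_features:", grid.best_params_["max_features"])
print("mean cross-validation accuracy: {:.3f}".format(grid.best_score_))
print("test accuracy: {:.3f}".format(grid.score(X_test, y_test)))
```

The test set is only scored after the search finishes, so the reported test accuracy is not influenced by the choice of max_features.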